# ExID : Boosting Offline RL with Expert Insights in Data-Scarce Settings

This repository contains the code for boosting offline RL performance on scarce data using domain knowledge.

# System Requirements

The code has been tested on the following operating system:

- Ubuntu 20.04.2 LTS

# Dependencies:

To reproduce all results, we provide requirement files for setting up a conda environment with the required packages. Run one of the following sets of commands to create and activate the environment:

```
$ conda create --name exidenv --file requirements.txt
$ conda activate exidenv
$ pip install -e .
```
Alternatively, create the environment with a fixed Python version and install the packages with pip:

```
$ conda create --name exidenv python=3.7.0
$ conda activate exidenv
$ python -m pip install -r piprequirements.txt
```

# Data Creation and processing:

We create all datasets using https://github.com/kschweig/OfflineRL and use the run3 dataset for our experiments.
To create a dataset:
1. Clone https://github.com/kschweig/OfflineRL and install its dependencies.
2. Run the experiment script with the `--online` flag (`expnumber.py --online`). For example, for the Mountain Car environment: `ex02.py --online`.
3. Process the data into our buffer type: run `python process.py --exp_name "path of dataset"`.

We have created additional experiment files for Lunar Lander (`ex_07.py`) and Acrobot (`ex_08.py`), provided under the experiment folder.

# Hyperparameter configurations

Hyperparameter configurations for each environment are listed under the config folder. The important hyperparameters are:

1. `episodes`: number of episodes to train on
2. `seed`: random seed (default 1); the experiments were conducted with seeds 1, 42, and 76
3. `data_file`: data file name, e.g. `data/Mountain_car_expertRun3.pkl`
4. `data_type`: type of data file: `er` (expert), `rep` (replay), or `ns` (noisy)
5. `data_percent`: percentage of the full buffer to train on (default 0.1)
6. `use_heur`: set to True when evaluating a baseline with domain knowledge, otherwise False
7. `use_teach`: set to True when training with the teacher network and False when training the baseline CQL
8. `warm_start`: the warm-start parameter, during which the student learns only from the teacher
9. `teacher_update`: the episode interval at which the teacher is updated
10. `lam`: lambda value weighting the regularizer (default 0.5)
11. `algo_type`: baseline algorithm type; supported types are [QRDQN, REM, BVE, CRR, MCE, BC, BCQ]
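
The on-disk format of the config files is not shown in this README, so the following is only an illustrative sketch that mirrors the keys above as a Python dictionary and checks the flag combination described for ExID. The `check_config` helper and the values marked as placeholders are assumptions for illustration and are not part of the repository.

```python
# Data-type codes as listed above.
DATA_TYPES = {"er": "expert", "rep": "replay", "ns": "noisy"}

# Hypothetical stand-in for a config file; placeholder values are NOT
# taken from the repository.
example_config = {
    "episodes": 100,          # placeholder
    "seed": 1,                # default from the README
    "data_file": "data/Mountain_car_expertRun3.pkl",
    "data_type": "er",
    "data_percent": 0.1,      # default from the README
    "use_heur": False,
    "use_teach": True,
    "warm_start": 10,         # placeholder
    "teacher_update": 5,      # placeholder
    "lam": 0.5,               # default from the README
    "algo_type": "ExID",
}


def check_config(cfg):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if cfg["data_type"] not in DATA_TYPES:
        problems.append(f"unknown data_type: {cfg['data_type']!r}")
    if cfg["algo_type"] == "ExID" and (cfg["use_heur"] or not cfg["use_teach"]):
        problems.append("ExID expects use_heur: False and use_teach: True")
    return problems


print(check_config(example_config))
```

A check like this can catch a mismatched flag combination before a long training run starts.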

# Constructing Teacher actor using BC and expert policy

The code for constructing the teacher policy is available in `ConstructingTeacherActorusingBC.ipynb`.
Random data from run3 is used as the states for the teacher policy.
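
As a rough illustration of the idea (not the notebook's code), the sketch below labels a set of states with a rule-based expert policy and then fits a small policy to those labels by behaviour cloning. The Mountain Car rule "push in the direction of motion", the logistic-regression teacher, and the uniformly sampled states standing in for the run3 data are all assumptions made for this example.

```python
import math
import random


def heuristic_action(state):
    """Hypothetical domain rule for Mountain Car: push in the direction of motion."""
    _position, velocity = state
    return 2 if velocity > 0 else 0  # 2 = push right, 0 = push left


def fit_teacher(states, epochs=300, lr=1.0):
    """Behaviour-clone the heuristic with a tiny logistic-regression policy."""
    # Rescale velocity (roughly [-0.07, 0.07]) so both features are comparable.
    feats = [(p, v / 0.07) for p, v in states]
    labels = [1.0 if heuristic_action(s) == 2 else 0.0 for s in states]
    w0 = w1 = b = 0.0
    n = len(states)
    for _ in range(epochs):
        g0 = g1 = gb = 0.0
        for (x0, x1), y in zip(feats, labels):
            prob_right = 1.0 / (1.0 + math.exp(-(w0 * x0 + w1 * x1 + b)))
            err = prob_right - y  # cross-entropy gradient w.r.t. the logit
            g0 += err * x0
            g1 += err * x1
            gb += err
        # Full-batch gradient step on the averaged gradient.
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
        b -= lr * gb / n

    def teacher(state):
        p, v = state
        return 2 if w0 * p + w1 * v / 0.07 + b > 0 else 0

    return teacher


rng = random.Random(1)
# Stand-in for states taken from the random run3 data.
states = [(rng.uniform(-1.2, 0.6), rng.uniform(-0.07, 0.07)) for _ in range(500)]
teacher = fit_teacher(states)
agreement = sum(teacher(s) == heuristic_action(s) for s in states) / len(states)
print(f"teacher agrees with the heuristic on {agreement:.0%} of states")
```

The states only need to cover the region where the rule applies; the actions come from the rule, not from the dataset.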

# Training the baseline algorithm

To train a baseline and evaluate it with domain knowledge, set `use_heur: True` and `algo_type` to one of QRDQN, REM, BVE, CRR, MCE, BC, or BCQ in the config file.
Then run the following Python file with the required configuration:

```
$ python train_baseline.py --config_file "config/mountain.config"
```

To train the baseline CQL, set `use_heur: True` and `use_teach: False` in the config file and run:
```
$ python train_baseline.py --config_file "config/mountain.config"
```

# Training an agent with ExID

To train an ExID agent, set `use_heur: False`, `use_teach: True`, and `algo_type: ExID` in the config file.
Then run the following Python file with the required configuration:

```
$ python train.py --config_file "config/mountain.config"
```
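
To make the roles of `warm_start`, `teacher_update`, and `lam` more concrete, here is one plausible reading of their descriptions in the hyperparameter list, written as two small helper functions. This is a sketch under assumptions, not the logic in `train.py`: the function names, the additive form of the loss, and the fixed-interval teacher schedule are all hypothetical.

```python
def student_loss(episode, rl_loss, teacher_reg, warm_start, lam=0.5):
    """Combine the offline-RL loss with the teacher regularizer (assumed form)."""
    if episode < warm_start:
        # Warm-start phase: the student learns only from the teacher.
        return teacher_reg
    # Afterwards the regularizer is weighted by lam.
    return rl_loss + lam * teacher_reg


def teacher_update_due(episode, teacher_update):
    """True on every teacher_update-th episode (assumed schedule)."""
    return episode > 0 and episode % teacher_update == 0


# Walk through a few episodes with placeholder loss values.
for episode in range(8):
    loss = student_loss(episode, rl_loss=2.0, teacher_reg=1.0, warm_start=3)
    flag = " (teacher update)" if teacher_update_due(episode, 5) else ""
    print(f"episode {episode}: loss={loss}{flag}")
```

Under this reading, a larger `lam` keeps the student closer to the teacher after the warm-start phase, while `lam` set to 0 would reduce the post-warm-start objective to the offline-RL loss alone.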

# For the Sales Promotion Environment Experiments

1. Install and set up the NeoRL environments following https://github.com/polixir/NeoRL/tree/benchmark
2. Install the CORL libraries using https://github.com/tinkoff-ai/CORL
3. Run `python exidsp.py`


# Data distribution and data coverage plots

All data distribution and coverage plots can be generated using the notebook `Plotting State Distribution and State action coverage.ipynb`.